defining plotting function (bad):

loading catalogue and USA politicians' mentions:

selecting only the main Democratic and Republican parties (those with more than 10 politicians in the catalogue; may need to filter further to be more exhaustive?). For each one, checked on Wikipedia whether it is "affiliated to national Democratic/Republican party". Keeping only the first party from the catalogue's parties list, not sure about this.
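
A minimal sketch of this filtering step, on toy data. The column names (`qid`, `parties`), the party labels and the `min_politicians` name are assumptions for illustration, not the catalogue's actual schema.

```python
import pandas as pd

# Toy catalogue (hypothetical): each politician has a list of parties.
catalogue = pd.DataFrame({
    "qid": [f"Q{i}" for i in range(25)],
    "parties": (
        [["Democratic Party"]] * 12
        + [["Republican Party", "Independent"]] * 11
        + [["Green Party"]] * 2
    ),
})

# Keep only the first party of each list (the choice questioned in the note above).
catalogue["party"] = catalogue["parties"].map(lambda parties: parties[0])

# Keep parties with more than `min_politicians` members in the catalogue.
min_politicians = 10
counts = catalogue["party"].value_counts()
main_parties = counts[counts > min_politicians].index

filtered = catalogue[catalogue["party"].isin(main_parties)]
```

The manual Wikipedia check is not reproduced here; it would amount to a hand-written list of party names to keep.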

create men_df, a dataframe of single mentions where both the mentioning and the mentioned politicians are in the catalogue:

create the adjacency table:
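
One possible way to build such a table, assuming `men_df` has one row per mention with hypothetical columns `speaker_qid` and `mentioned_qid` (the real column names may differ):

```python
import pandas as pd

# Toy mentions (hypothetical data and column names).
men_df = pd.DataFrame({
    "speaker_qid":   ["Q1", "Q1", "Q2", "Q3", "Q1"],
    "mentioned_qid": ["Q2", "Q3", "Q1", "Q1", "Q2"],
})

# Rows = mentioning politician, columns = mentioned politician, cells = mention counts.
adj = pd.crosstab(men_df["speaker_qid"], men_df["mentioned_qid"])

# Make the table square over the union of qids, so it can be read as an adjacency matrix.
qids = sorted(set(men_df["speaker_qid"]) | set(men_df["mentioned_qid"]))
adj = adj.reindex(index=qids, columns=qids, fill_value=0)
```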

build graph from adjacency matrix:
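
A sketch of this step with networkx, assuming the adjacency table is a square pandas DataFrame indexed by qid (toy values below). Zero cells produce no edge and the counts end up in the `weight` edge attribute.

```python
import networkx as nx
import pandas as pd

# Toy square adjacency table (hypothetical values): adj.loc[a, b] = mentions of b by a.
qids = ["Q1", "Q2", "Q3"]
adj = pd.DataFrame(
    [[0, 2, 1],
     [1, 0, 0],
     [1, 0, 0]],
    index=qids,
    columns=qids,
)

# Directed graph: an edge a -> b means "a mentions b".
G = nx.from_pandas_adjacency(adj, create_using=nx.DiGraph)
```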

compute the kernighan_lin_bisection of the graph. It only accepts undirected graphs, but for communities edge direction doesn't matter. paper link
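
A minimal sketch on a toy graph. Summing the weights of reciprocal edges when dropping direction is an assumption made here, not necessarily what the notebook does; the fixed `seed` is only for reproducibility.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Toy directed mention graph (hypothetical): two triangles joined by one weak edge.
D = nx.DiGraph()
D.add_weighted_edges_from([
    ("Q1", "Q2", 3), ("Q2", "Q1", 2), ("Q2", "Q3", 4), ("Q3", "Q1", 5),
    ("Q4", "Q5", 3), ("Q5", "Q6", 4), ("Q6", "Q4", 5),
    ("Q3", "Q4", 1),
])

# kernighan_lin_bisection is not implemented for directed graphs, so build an
# undirected copy, summing the weights of a -> b and b -> a.
U = nx.Graph()
U.add_nodes_from(D.nodes)
for u, v, w in D.edges(data="weight", default=1):
    if U.has_edge(u, v):
        U[u][v]["weight"] += w
    else:
        U.add_edge(u, v, weight=w)

# Returns a pair of node sets forming a partition of the graph.
community_a, community_b = kernighan_lin_bisection(U, weight="weight", seed=0)
```

Which set comes out as "a" or "b" is arbitrary, which is why the later comparison with party labels has to be insensitive to a label swap.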

plotting the 20 most central nodes of each community together:

plotting the whole graph (not really meaningful)

most central nodes and node degree plots:
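
A sketch of how the most central nodes and the degrees could be obtained before plotting. The notes do not say which centrality measure is used; degree centrality is assumed here, and `top_k` is a hypothetical name.

```python
import networkx as nx

# Toy undirected graph (hypothetical).
G = nx.Graph()
G.add_edges_from([("Q1", "Q2"), ("Q1", "Q3"), ("Q1", "Q4"), ("Q2", "Q3")])

# Degree centrality: fraction of the other nodes each node is connected to.
centrality = nx.degree_centrality(G)

# Nodes sorted from most to least central; keep the first top_k.
top_k = 2
most_central = sorted(centrality, key=centrality.get, reverse=True)[:top_k]

# Raw degrees, e.g. as input for a degree histogram.
degrees = dict(G.degree())
```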

compute the catalogue restricted to qids in the network, and set the binary_party and binary_community columns:
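
A minimal sketch of this step. The `qid` and `party` column names, the party labels and the toy community set are assumptions; which party or community gets the value 1 is arbitrary.

```python
import pandas as pd

# Toy catalogue and network (hypothetical data).
catalogue = pd.DataFrame({
    "qid":   ["Q1", "Q2", "Q3", "Q4"],
    "party": ["Democratic Party", "Republican Party",
              "Republican Party", "Democratic Party"],
})
network_qids = {"Q1", "Q2", "Q3"}   # nodes present in the graph
community_a = {"Q2", "Q3"}          # one side of the bisection

# Keep only politicians that appear in the network.
cat_net = catalogue[catalogue["qid"].isin(network_qids)].copy()

# Encode party and community membership as 0/1.
cat_net["binary_party"] = (cat_net["party"] == "Republican Party").astype(int)
cat_net["binary_community"] = cat_net["qid"].isin(community_a).astype(int)
```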

compute the Spearman correlation coefficient and p-value. Spearman correlation makes sense here because we want to capture whether the 2 variables are correlated or anti-correlated: we don't control which binary value gets assigned to which community (all same => Spearman coeff = 1, all opposite => Spearman coeff = -1, and we take the absolute value). A plain similarity check is not enough because it doesn't capture anti-correlation. wikipedia link
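
A small sketch of the label-swap argument with `scipy.stats.spearmanr`, on made-up binary vectors: the community labels below are the exact opposite of the party labels, so the raw coefficient is -1 and its absolute value is 1.

```python
from scipy.stats import spearmanr

# Toy labels (hypothetical): community labels are the party labels with 0 and 1 swapped.
binary_party     = [0, 0, 0, 1, 1, 1, 0, 1]
binary_community = [1, 1, 1, 0, 0, 0, 1, 0]

rho, pval = spearmanr(binary_party, binary_community)

# The sign only reflects the arbitrary labelling, so compare on the absolute value.
agreement = abs(rho)

# A naive similarity (fraction of equal labels) would report no agreement at all here.
similarity = sum(p == c for p, c in zip(binary_party, binary_community)) / len(binary_party)
```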